Bioinformatics (Thomas Dandekar, Meik Kunz)

7.1

The figure shows a selection. Each nucleotide represents two bits in the sense of Shannon coding or Shannon entropy, since there are four possible nucleotides (2**2 is 4). If you look at proteins, there are 20 amino acids encoded with 64 codons, i.e. 6 bits per codon (because 2 to the power of 6 or 2**6 is 64).
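
The bit counts above follow from the base-2 logarithm of the number of possible symbols. A minimal Python sketch can make this concrete; the function names `bits_per_symbol` and `shannon_entropy` and the two example sequences are illustrative assumptions, not part of the original text. It also computes the Shannon entropy of an observed sequence, which only reaches the full 2 bits per nucleotide when all four bases occur equally often.

```python
import math
from collections import Counter


def bits_per_symbol(n_symbols: int) -> float:
    """Bits needed to pick one of n equally likely symbols (log2 n)."""
    return math.log2(n_symbols)


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy in bits per symbol, from observed symbol frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(bits_per_symbol(4))    # 4 nucleotides
print(bits_per_symbol(64))   # 64 codons
print(bits_per_symbol(20))   # 20 amino acids

# Hypothetical example sequences: one uniform, one strongly biased.
print(shannon_entropy("ACGTACGT"))
print(shannon_entropy("AAAAAAAT"))
```

Under these assumptions, the 20 amino acids would need only about 4.3 bits each, so the 6-bit codon carries some redundancy.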

The three-dimensional protein structure code is much more complex. There are so many possibilities here that the information value of a defined protein structure is very high (as a simplified estimate, count the bits in a downloaded PDB structure file, which already amounts to hundreds of thousands of bits). An informatically clever alternative is to use internal coordinates, which encode protein structures with few bits: only the path from one amino acid to the next is ever specified. This can be done with the angles phi and psi at the central carbon atom (alpha-C atom) of each amino acid (AlQuraishi 2019). If I then use only four or eight standard conformations to represent the protein structure in a highly simplified way, I need just 2 or 3 bits for each amino acid position in a protein folding simulation (Saxena et al. 1997).
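
As a rough illustration of this reduced encoding, the following Python sketch computes how many bits a chain needs when each residue is assigned one of a fixed number of standard conformations, and packs such assignments into a single integer. The function names, the 150-residue chain length and the numbering of the conformations are hypothetical choices for illustration; they are not taken from Saxena et al. (1997).

```python
def bits_per_residue(n_conformations: int) -> int:
    """Whole bits needed to label one of n standard conformations."""
    return (n_conformations - 1).bit_length()


def pack_states(states: list[int], n_conformations: int) -> int:
    """Pack per-residue conformation indices into one integer (first residue in the highest bits)."""
    width = bits_per_residue(n_conformations)
    packed = 0
    for state in states:
        if not 0 <= state < n_conformations:
            raise ValueError(f"state {state} out of range")
        packed = (packed << width) | state
    return packed


def unpack_states(packed: int, n_residues: int, n_conformations: int) -> list[int]:
    """Recover the per-residue conformation indices from the packed integer."""
    width = bits_per_residue(n_conformations)
    mask = (1 << width) - 1
    states = [(packed >> (width * i)) & mask for i in range(n_residues)]
    return states[::-1]


# Assumed chain length, for illustration only.
n_residues = 150
for n_conf in (4, 8):
    total = n_residues * bits_per_residue(n_conf)
    print(f"{n_conf} conformations: {total} bits for {n_residues} residues")

# Round trip for a hypothetical four-residue fragment with four conformations.
fragment = [3, 0, 2, 1]
code = pack_states(fragment, 4)
print(bin(code), unpack_states(code, len(fragment), 4))
```

Even for a chain of this assumed length, the simplified code stays at a few hundred bits, far below the hundreds of thousands of bits of a full PDB file.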

Finally, there are other codes, for example at the cell membrane (membrane lipids, but also specific membrane modifications), the RNA sequence-structure code within the cell for regulatory RNA, metabolic regulation (e.g. of iron) and localisation in the cell, and the sugar code at the cell surface, with which cells recognise each other and via which transplant rejection is also encoded. There are also membrane lipids, for example gangliosides and cerebrosides (i.e. sugar-lipid structures), that assign the wiring in the brain and the different neuronal structures to each other in detail, in order to ensure the plasticity of our brain during embryonic development.

All these codes are not only used and needed in the cell; you can also decode them with bioinformatics, especially from the sequence.

In this way, it is possible to translate the fairly universal genetic code (program “Translate” from the Expert Protein Analysis System, EXPASY, at the “Swiss Institute of Bioinformatics”, https://web.expasy.org/translate/) and to better understand its rarer variants for certain codons, for example in mitochondria, some bacteria and also protozoa (Heaphy et al. 2016) (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi). Similarly, signals in regulatory RNA can be analyzed, for example with the RNA analyzer (https://rnaanalyzer.bioapps.biozentrum.uni-wuerzburg.de/), as can sugar codes (https://www.functionalglycomics.org/; https://ncfg.hms.harvard.edu/) or lipid codes, for example to assign lipids to the correct type after mass spectrometry (Ahmed et al. 2015).
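
To illustrate what translating the genetic code and one of its variants involves, here is a minimal Python sketch. It is not the implementation of the EXPASY “Translate” program; the function names `build_table` and `translate` and the example sequence are assumptions made for illustration. The codon table is written in the compact layout used for genetic code tables (bases in the order T, C, A, G), and the variant shown is the vertebrate mitochondrial code, which differs from the standard code in four codons.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(amino_acids: str) -> dict[str, str]:
    """Map each of the 64 codons to its one-letter amino acid."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))


STANDARD = build_table(STANDARD_AA)

# Vertebrate mitochondrial variant: AGA/AGG are stops, ATA is Met, TGA is Trp.
VERT_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(seq: str, table: dict[str, str] = STANDARD) -> str:
    """Translate a DNA or RNA sequence in frame 1; unknown codons become 'X'."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


example = "ATGTGAATA"  # hypothetical sequence chosen to hit two variant codons
print(translate(example))             # M*I
print(translate(example, VERT_MITO))  # MWM
```

The same sequence therefore yields a premature stop under the standard code but a three-residue peptide under the mitochondrial variant, which is why the correct table has to be chosen before translating.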

7.3 Understanding Coding Better

So what can we take away as insights? It’s a lot like a conversation in a busy pub. The signals of the cell are constantly fighting against the background noise. Apart from our own signalling cascade, which we are currently interested in, such as the Erk kinase
